The Life and Death of Discourse Entities: Identifying Singleton Mentions
Abstract
A discourse typically involves numerous entities, but few are mentioned more than once. Distinguishing discourse entities that die out after just one mention (singletons) from those that lead longer lives (coreferent) would benefit NLP applications such as coreference resolution, protagonist identification, topic modeling, and discourse coherence. We build a logistic regression model for predicting the singleton/coreferent distinction, drawing on linguistic insights about how discourse entity lifespans are affected by syntactic and semantic features. The model is effective in its own right (78% accuracy), and incorporating it into a state-of-the-art coreference resolution system yields a significant improvement.
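As a rough illustration of the singleton/coreferent classifier described in the abstract, the sketch below trains a small logistic regression model by batch gradient descent. The three binary features (`is_pronoun`, `is_indefinite`, `is_subject`), the toy training data, and all function names are assumptions made for this example; they are not the paper's feature set, data, or implementation.

```python
import math

# Hypothetical toy features per mention: [is_pronoun, is_indefinite, is_subject].
# Label 1 = coreferent, 0 = singleton. The data is invented for illustration only.
TRAIN = [
    ([1.0, 0.0, 1.0], 1),
    ([1.0, 0.0, 0.0], 1),
    ([0.0, 0.0, 1.0], 1),
    ([0.0, 1.0, 0.0], 0),
    ([0.0, 1.0, 1.0], 0),
    ([0.0, 0.0, 0.0], 0),
]


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Estimated probability that a mention is coreferent."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(data, epochs=2000, lr=0.5):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in data:
            # Gradient of the log loss with respect to the score is (p - y).
            error = predict_proba(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


def classify(weights, bias, features):
    """Label a mention using a 0.5 probability threshold."""
    if predict_proba(weights, bias, features) >= 0.5:
        return "coreferent"
    return "singleton"


WEIGHTS, BIAS = train(TRAIN)
```

Calling `classify(WEIGHTS, BIAS, [1.0, 0.0, 1.0])` labels a hypothetical pronoun in subject position. A real system would derive its features from parsed text and evaluate on held-out data.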
Similar resources
Identification of singleton mentions in Russian
This paper describes a pilot study of the problem of detecting singleton mentions in Russian texts. A noun phrase is considered a singleton mention if it is the only referent of some entity. We discuss various morphosyntactic and lexical features, some of which were used for analogous tasks for English, and propose new features derived from the discourse analysis. Testing the machine l...
A Comparative Analysis of Self-Mentions in Applied Linguistics PhD Dissertations Written by Native and Non-Native English Writers
The purpose of the present study was to compare the PhD dissertations written by native and non-native English writers in the field of Applied Linguistics with regard to the use of self-mentions. To this end, 40 Applied Linguistics PhD dissertations (20 written by native English writers and 20 by non-native English writers) were selected randomly among academic texts written in 2007-2017. The p...
Entity-based Coreference Resolution combined with Discourse-New Detection
Anaphora and coreference resolution is a well-studied topic in NLP research, allowing a deeper understanding of the text than shallow methods by revealing discourse structures. Traditional systems reason over mentions, rather than entities, and perform clustering after the resolution process. In this work, general drawbacks with this approach are considered and related works employing knowledge...
Singleton Detection using Word Embeddings and Neural Networks
Singleton (or non-coreferential) mentions are a problem for coreference resolution systems, and identifying singletons before mentions are linked improves resolution performance. Here, a singleton detection system based on word embeddings and neural networks is presented, which achieves state-of-the-art performance (79.6% accuracy) on the CoNLL-2012 shared task development set. Extrinsic evaluat...
One Vector is Not Enough: Entity-Augmented Distributed Semantics for Discourse Relations
Discourse relations bind smaller linguistic units into coherent texts. Automatically identifying discourse relations is difficult, because it requires understanding the semantics of the linked arguments. A more subtle challenge is that it is not enough to represent the meaning of each argument of a discourse relation, because the relation may depend on links between lower-level components, such ...
Publication date: 2013